Barcode identification for single cell genomics

Authors

  • Akshay Tambe
  • Lior Pachter
Abstract

Single-cell sequencing experiments use short DNA barcode ‘tags’ to identify reads that originate from the same cell. To recover single-cell information from such experiments, reads must be grouped by barcode tag, a crucial processing step that precedes other computations. However, this step can be difficult because of the high rates of mismatch and deletion errors that afflict barcodes. Here we present an approach to identify and error-correct barcodes by traversing the de Bruijn graph of circularized barcode k-mers. This allows reads to be assigned to consensus fingerprints constructed from k-mers, and we show that for single-cell RNA-Seq this improves the recovery of accurate single-cell transcriptome estimates.

Availability and implementation

Source code is freely available on GitHub: https://github.com/pachterlab/Sircel. The repository also contains IPython notebooks to reproduce all analyses presented in this paper.

Introduction

Tagging of sequencing reads with short DNA barcodes is a common experimental practice that enables a pooled sequencing library to be separated into biologically meaningful partitions. This technique is the cornerstone of many single-cell sequencing experiments, where reads originating from individual cells are tagged with cell-specific barcodes; as such, the first step in any single-cell sequencing experiment involves separating reads by barcode to recover single-cell profiles (Svensson et al., 2017; Trapnell, 2015; Klein et al., 2015). For example, in the Drop-Seq protocol, a popular microfluidic-based single-cell experimental platform, DNA barcodes are synthesized on a solid bead support using split-and-pool DNA synthesis (Macosko et al., 2015). Similar split-and-pool barcoding strategies are used in other single-cell sequencing assays such as Seq-Well (Gierahn et al., 2017) and Split-seq (Rosenberg et al., 2017).
One consequence of this synthetic technique is that deletion errors are extremely prevalent; by some estimates, 25% of all observed barcode sequences contain at least one deletion (Macosko et al., 2015). Ignoring such errors can therefore dramatically lower the number of usable reads in a dataset, while incorrectly grouping reads together can confound single-cell analysis. Current approaches to “barcode calling”, the process of grouping reads together by barcode, use simple heuristics to first identify barcodes that are likely to be uncorrupted, and then “error correct” the remaining barcodes to increase yields. However the complex ...

bioRxiv preprint first posted online May 9, 2017; doi: http://dx.doi.org/10.1101/136242. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY-NC 4.0 International license.
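To make the idea of circularized barcode k-mers more concrete, here is a minimal Python sketch. It is not the Sircel implementation: the function names `circular_kmers` and `kmer_edge_counts`, the example barcode, and the choice of k = 4 are all assumptions made for illustration. It only shows how a barcode can be treated as a circular string, broken into k-mers (the edges of a de Bruijn graph), and how a read with a single deletion still shares most of its k-mers with the uncorrupted barcode.

```python
from collections import Counter


def circular_kmers(barcode: str, k: int) -> list[str]:
    """Return the k-mers of a barcode treated as a circular string.

    Hypothetical helper for illustration; not taken from Sircel.
    """
    if not 0 < k <= len(barcode):
        raise ValueError("k must be between 1 and the barcode length")
    # Append the first k-1 bases so that windows wrap around the end.
    extended = barcode + barcode[: k - 1]
    return [extended[i : i + k] for i in range(len(barcode))]


def kmer_edge_counts(barcodes: list[str], k: int) -> Counter:
    """Count k-mers (de Bruijn graph edges) over all circularized barcodes."""
    counts: Counter = Counter()
    for barcode in barcodes:
        counts.update(circular_kmers(barcode, k))
    return counts


# Assumed example: a 12-base barcode and a copy with one base deleted.
true_barcode = "ACGTTGCAAGCT"
with_deletion = "ACGTTCAAGCT"  # the 'G' at index 5 is missing

shared = set(circular_kmers(true_barcode, 4)) & set(circular_kmers(with_deletion, 4))
print(len(shared), "of", len(circular_kmers(true_barcode, 4)), "k-mers are shared")

# k-mers seen in both reads receive weight 2 in the pooled edge counts.
edge_counts = kmer_edge_counts([true_barcode, with_deletion], 4)
```

In this toy case the deletion removes only the k-mers that span the deleted base, so the two reads still overlap heavily in k-mer space. That overlap is the intuition behind assigning reads to k-mer-based consensus fingerprints; the graph traversal itself is not shown here.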


Similar resources

Turning single cells into microarrays by super-resolution barcoding.

In this review, we discuss a strategy to bring genomics and proteomics into single cells by super-resolution microscopy. The basis for this new approach is the following: given the 10 nm resolution of a super-resolution microscope and a typical cell with a size of (10 µm)^3, individual cells contain effectively 10^9 super-resolution pixels or bits of information. Most eukaryotic cells have 1...


Title: Targeting individual cells by barcode in pooled sequence libraries Authors: Navpreet Ranu, Alexandra-Chloé Villani, Nir Hacohen, and Paul C. Blainey

There is rising interest in applying single-cell transcriptome analysis and other single-cell sequencing methods to resolve differences between cells. Pooled processing of thousands of single cells is now routinely practiced by introducing cell-specific DNA barcodes early in cell processing protocols. However, researchers must sequence a large number of cells to sample rare subpopulations ...



Massively parallel multiplex DNA sequencing for specimen identification using an Illumina MiSeq platform

Genetic information is a valuable component of biosystematics, especially specimen identification through the use of species-specific DNA barcodes. Although many genomics applications have shifted to High-Throughput Sequencing (HTS) or Next-Generation Sequencing (NGS) technologies, sample identification (e.g., via DNA barcoding) is still most often done with Sanger sequencing. Here, we present ...


Identifying structural variants using linked-read sequencing data

Motivation Structural variation, including large deletions, duplications, inversions, translocations, and other rearrangements, is common in human and cancer genomes. A number of methods have been developed to identify structural variants from Illumina short-read sequencing data. However, reliable identification of structural variants remains challenging because many variants have breakpoints i...




Publication date: 2017